The problem of signal detection using sparse, faint information
is closely related to a variety of contemporary statistical problems, 
including the control of false-discovery rate, and classification using 
very high-dimensional data. Each problem can be solved by conduct- 
ing a large number of simultaneous hypothesis tests, the properties 
of which are readily accessed under the assumption of independence. 
In this paper we address the case of dependent data, in the context of 
higher criticism methods for signal detection. Short-range dependence 
has no first-order impact on performance, but the situation changes 
dramatically under strong dependence. There, although higher criti- 
cism can continue to perform well, it can be bettered using methods 
based on differences of signal values or on the maximum of the data. 
The relatively inferior performance of higher criticism in such cases 
can be explained in terms of the fact that, under strong dependence, 
the higher criticism statistic behaves as though the data were parti- 
tioned into very large blocks, with all but a single representative of 
each block being eliminated from the dataset. 

1. Introduction. 

1.1. Decision-making using sparse, faint information. Modern data ac- 
quisition routinely produces massive, complex datasets in many scientific 
areas, for example, genomics, astronomy and functional MRI. The need for 
fast and effective analysis in these settings poses challenging statistical prob- 
lems, one of which is how to reliably detect the presence of a sparse, faint 
signal. Here, data are available on a large number of observation units (or hy- 
pothesis tests, or transform coefficients, etc.), which may or may not contain 
a signal; the signal, when present, is faint and is dispersed across different
observation units (e.g., vector components) in an unknown fashion. 

The situation arises naturally in a range of application areas, for example, 
early detection of airborne bio-terror and syndromic surveillance (Donoho 
and Jin [7] and Anon [1]), covert communications (Donoho and Jin [7]) 
and non-Gaussian detection of the cosmic microwave background (CMB) 
(Cayon, Jin and Treaster [5] and Jin et al. [20]). In these examples the 
desired signal may represent the outbreak of a certain disease, or covertly
attached signals in a white noise channel, or non-Gaussian signatures in the 
CMB, and so on. In such cases the signal is either highly sparse, or faint in 
its individual presences, or both. The sparsity and faintness play an intricate 
duet which calls for nontraditional methods for signal detection. 

The signal detection problem is closely connected to that of multiple hy- 
pothesis testing. Indeed, if in the former setting we associate each obser- 
vation with a null hypothesis, which asserts that the signal is not present 
and the observation is pure noise, then the signal detection problem can 
be viewed as attempting to determine whether at least one of the null hy- 
potheses is false. From this point of view, related work has been done in, for 
example, problems of assessing the accuracy of random number generators; 
see, for example, Knuth [22]. Review-type accounts of multiple hypothesis 
testing include those of Hochberg and Tamhane [13], Pigeot [25], Dudoit, 
Shaffer and Boldrick [9], Bernhard, Klein and Hommel [3] and Lehmann and
Romano ([23], Chapter 9). 

Of course, the problem of (1) discovering which null hypotheses are false 
is more difficult than that of (2) estimating the proportion of false null 
hypotheses, which in turn is more challenging than (3) determining whether 
at least one null hypothesis is false. Work on these respective problems 
includes that of (1) Benjamini and Hochberg [2], Genovese and Wasserman 
[11], Donoho and Jin [8], Efron et al. [10] and Storey, Dai and Leek [26]; 
(2) Swanepoel [27], Efron et al. [10], Cai, Jin and Low [4], Genovese and 
Wasserman [11], Storey, Dai and Leek [26], Jin [18], Jin and Cai [19], Jin, 
Peng and Wang [21] and Meinshausen and Rice [24]; (3) Donoho and Jin 
[7], Delaigle and Hall [6], Hall, Pittelkow and Ghosh [12], Ingster [14, 15],
Jin [17] and Jager and Wellner [16]. 

1.2. Higher criticism methods for independent data. Inspired by ideas of 
Tukey [28], Donoho and Jin [7] proposed higher criticism methods for signal
detection in the presence of white noise. The technique is based on assessing
the statistical significance of the number of significant results in a long
sequence of hypothesis tests, which can be either formal or informal. The 
principle has found a variety of applications, for example, to non-Gaussian 
detection (Cayon, Jin and Treaster [5] and Jin et al. [20]), goodness of fit 
(Jager and Wellner [16]) and classification (Hall, Pittelkow and Ghosh [12]
and Delaigle and Hall [6]). 

A higher criticism statistic, computed from standard normal data $X_i$ to
which might be added sparsely distributed, positive signals, can be defined
by

(1.1)  $\mathrm{hc}_n = \sup_{t \in S_n} \dfrac{\sum_{i=1}^{n} \{I(X_i > t) - \bar\Phi(t)\}}{\{n\,\Phi(t)\,\bar\Phi(t)\}^{1/2}},$

where $S_n$ denotes a subset of the real line, $\Phi$ is the standard normal
distribution function and $\bar\Phi = 1 - \Phi$. If positive signals are added
to the $X_i$'s, then the numerator on the right-hand side of (1.1) takes
relatively large values. Provided the size and number of added signals are
sufficiently great, $\mathrm{hc}_n$ exceeds, with high probability, a critical
point calculated under the assumption of "no signal." The size of this
exceedance can be used as a basis for detecting the presence of the signals.
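
As a rough illustration of a statistic of this type, the following sketch evaluates the standardized exceedance count over a finite grid of thresholds rather than over a general set $S_n$. The function names (`normal_sf`, `higher_criticism`) and the grid are assumptions made for the example; they are not taken from the paper.

```python
import math
import random


def normal_sf(t):
    """Upper tail probability 1 - Phi(t) of the standard normal distribution."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def higher_criticism(data, thresholds):
    """Largest standardized exceedance count over a finite grid of thresholds.

    The finite grid stands in for the set S_n; this is an assumption of the
    sketch, not a prescription from the text.
    """
    n = len(data)
    best = -math.inf
    for t in thresholds:
        tail = normal_sf(t)
        denom = math.sqrt(n * tail * (1.0 - tail))
        if denom == 0.0:
            continue  # skip thresholds so extreme that the variance underflows
        exceed = sum(1 for x in data if x > t)
        best = max(best, (exceed - n * tail) / denom)
    return best


# Illustrative usage on pure noise (the seed is arbitrary).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
grid = [0.1 * k for k in range(-30, 31)]
hc_noise = higher_criticism(noise, grid)
```

Adding positive signals to some of the observations can only increase the exceedance counts, and hence the value returned.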

1.3. The case of dependent data. It can be shown (see Delaigle and Hall 
[6]) that the main features of higher criticism do not alter under condi- 
tions of short-range dependence, for example, if the data Xi come from a 
moving-average process with exponentially decaying weights. However, the 
nature of the higher criticism statistic can change dramatically under strong 
dependence. Provided an appropriate critical point is employed to assess sig- 
nificance, higher criticism can still be used effectively in such cases, although 
its performance does not necessarily compare well with that of competing, 
difference-based signal detectors. Nevertheless, in order to be effective the 
latter approaches can require significant information to be known about the 
signal, and so their theoretical attractiveness may not necessarily be evi- 
denced in practice. 

To explore and elucidate these issues we shall treat dependent data
generated under a simple autocovariance model:

(1.2)  $\rho_n(k) = \max(0,\, 1 - |k|^{\alpha}\,\ell_n^{-\alpha}),$

where $\alpha > 0$ and $\ell_n$ denotes a positive sequence diverging to
infinity. If $X_i$ is a zero-mean, stationary Gaussian process with
$\mathrm{cov}(X_{i_1}, X_{i_2}) = \rho_n(i_1 - i_2)$, then data values lagged
$\ell_n$ or more apart are independent. Therefore, by choosing $\ell_n$ to
diverge more slowly we reduce the range of dependence.
Motivation for contexts such as this, and for the dependent data case more 
generally, is given in Section 1.4. 

We shall show that for large $n$, and to a first approximation, the numerator
at (1.1) behaves like $\ell_n \sum_j \{I(Z_j > t) - \bar\Phi(t)\}$, where there
are just $n/\ell_n$ terms in this series and the $Z_j$'s are independent and
normal $N(0,1)$. That is, the higher criticism statistic behaves as though the
data were partitioned into blocks, and were identically equal to $Z_j$ within
the $j$th block. (Here it is assumed that $1 \le \ell_n \le n$.) Reflecting
this property, the number of signals present should be increased by the factor
$\ell_n$ if performance is to be similar to that in the case of independence;
we shall make this precise in Section 3.4. This blockwise view provides
considerable insight into properties of the higher criticism statistic,
although it conceals the fact that, under strong dependence, the differences
$X_{i+1} - X_i$ can be used to construct powerful tests.

1.4. Speckle imaging. One of the challenging aspects of earth-based as- 
tronomical imaging is removing the effects of atmospheric turbulence. The 
atmosphere is in constant motion, and it alters a point-like sharp signal 
to a blurred signal whose position changes, sometimes rapidly, over time. 
Speckle imaging involves creating many images, one after the other (e.g., 
thousands in an hour are possible using current technology), and combining 
these rapid "snapshots" to gain greater information (e.g., higher resolution) 
than could be managed using conventional imaging. A disadvantage of this 
approach is that it gives images with relatively low brightness, and so many 
short-exposure images are "stacked" to achieve greater sensitivity. There 
is a variety of ways of aligning stacked images, for example, by using the 
brightest point (a "speckle") as the locus. 

Of course, the brightest speckle moves relative to the point source in the 
heavens, for example, because of atmospheric effects and telescope vibration. 
Moreover, it can subdivide into more than one speckle, due to atmospheric
effects. Therefore image stacking retains significant levels of noise, although
it produces images which enjoy relatively high signal-to-noise ratios at high
frequencies. While image stacking involves spatial, two-dimensional data, 
intellectually the issues are most transparently tackled in a one-dimensional 
setting, without making intrinsic changes. In this paper we analyze a one- 
dimensional model for the type of data obtained from image stacking, and 
report on the implications of the model. 

The stacking procedure results, first, in very faint point sources, often only 
a pixel or two wide and, in many cases, representing the same heavenly point 
source but located in different (perhaps multiple) positions in each image; 
and, second, in a background noise process that is defined essentially in the 
continuum, and whose correlation can extend over many pixels, especially 
if pixel width is small. In particular, the correlation between the noise at
pixels $i$ and $j$ can be represented as $1 - g(|i - j|/n)$, where $n^{-1}$
denotes pixel width and $g$ is a smooth function. The autocovariance model at
(1.2) is an
idealized form of this. 

As technology improves, the distance between adjacent pixels becomes 
smaller (equivalently, the value of n becomes larger), and the number of im- 
ages in a stack increases (implying that the interpolated continuum model 
in the previous paragraph applies more closely). Work of Jin [17] uses higher
criticism methods in this setting, but against the background (and the in- 
dependence assumptions) of Donoho and Jin [7]. The present paper takes 
a more careful look at what can occur in the case of dependence, and in 
particular reveals the suboptimality of higher criticism there. 

To extend these concepts to other settings, suppose we have data from 
a process Z which is a discretization (e.g., on a pixel grid) of a continuum 
process for which the autocorrelation function is $1 - g$, where $g$ is smooth. As
the discretization becomes finer, the strength of dependence of the discrete 
process increases, and the results developed in this paper become relevant. In 
the astronomical example, greater fineness is the result of improved imaging 
methods, but in other settings the cause may be different. 

In practice, a reasonable amount of information is known about the signal, 
and as in all imaging problems, further advice on choice of the critical point 
can be gained by visual experimentation. (Sometimes the differences between 
imaging problems and curve estimation are overlooked in this respect.) 

2. Overview. 

2.1. Time-series model and its immediate consequences. Let $X_{n1}, X_{n2}, \ldots$
denote a stationary Gaussian process, with zero mean and autocovariance
given by a slight specialization of (1.2):

(2.1)  $\rho_n(k) = \max(0,\, 1 - |k|^{\alpha}\, n^{-\alpha_0}),$

where $\alpha, \alpha_0 > 0$. That is, $\mathrm{cov}(X_{n i_1}, X_{n i_2}) = \rho_n(i_1 - i_2)$
for each pair $(i_1, i_2)$. The length of the range of dependence of this
process increases with both $\alpha_0$ and $\alpha^{-1}$.
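
The autocovariance model is simple enough to evaluate directly. The sketch below, with hypothetical function names `rho_n` and `dependence_range`, computes the autocovariance at a given lag and the lag beyond which it vanishes, namely $n^{\alpha_0/\alpha}$, which follows from setting $|k|^{\alpha} n^{-\alpha_0} = 1$.

```python
def rho_n(k, n, alpha, alpha0):
    """Autocovariance max(0, 1 - |k|^alpha * n^(-alpha0)) at lag k."""
    return max(0.0, 1.0 - abs(k) ** alpha * float(n) ** (-alpha0))


def dependence_range(n, alpha, alpha0):
    """Lag n^(alpha0/alpha) at and beyond which the autocovariance is zero."""
    return float(n) ** (alpha0 / alpha)


# Illustrative values only: n = 100, alpha = 2, alpha0 = 2 gives a range of 100.
example_range = dependence_range(100, 2.0, 2.0)
```

Increasing `alpha0`, or decreasing `alpha`, lengthens the range returned by `dependence_range`, in line with the remark above.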

Let $\kappa = \alpha_0/\alpha$. Then $\ell_n$, in (1.2), is effectively equal
to $n^{\kappa}$. If we are considering crossings of levels proportional to
$\sqrt{\log n}$, then, to a first approximation, there is a high probability
that any crossing is succeeded by "almost" $n^{\kappa}$ further such crossings,
although not by $n^{\kappa}$ crossings. To make this claim more precise we note
that if $t = \sqrt{2q \log n}$, where $q > 0$, then

(2.2)  for each $\varepsilon > 0$,  $P(X_{ni} > t \text{ for } 2 \le i \le n^{\kappa - \varepsilon} \mid X_{n1} > t) \to 1.$

On the other hand,

(2.3)  $P(X_{ni} > t \text{ for } 2 \le i \le n^{\kappa} \mid X_{n1} > t) \to 0.$

Result (2.2), under the side condition $\alpha > 2$, follows from Theorem 3.2 in
Section 3. Property (2.3) is simpler to derive; note that if $X$ and $Y$ are
jointly normally distributed with zero means, unit variances and covariance
$\rho$, where $|\rho| < 1$, then $P(X > t \mid Y > t) \to 0$ as $t \to \infty$.

2.2. Blockwise decomposition of higher criticism statistic. Suppose we
observe only the first $n$ terms, that is, $X_{n1}, \ldots, X_{nn}$, in the
time series. Then (2.2) and (2.3) indicate that, if we are addressing
level-crossings on a $\sqrt{\log n}$ scale, the sum of indicator functions that
defines the higher criticism statistic can be written approximately as a sum of
$\max(1, n^{1-\kappa})$ indicators of time-series variables lagged
$\min(n, n^{\kappa})$ apart:

(2.4)  $\sum_{i=1}^{n} I(X_{ni} > t) \approx \min(n, n^{\kappa}) \sum_{j=0}^{\max(1,\, n^{1-\kappa}) - 1} I(X_{n,\, j n^{\kappa} + 1} > t).$

Here we interpret $j n^{\kappa}$ as $j \lceil n^{\kappa} \rceil$, where
$\lceil x \rceil$ denotes the least integer not strictly less than $x$. We may
consider $X_{n,\, j n^{\kappa} + 1}$, appearing in the argument of the indicator
variable on the right-hand side of (2.4), as representing the $j$th "block" of
time-series values, that is, as representing the data

$X_{n,\, j n^{\kappa} + 1},\; X_{n,\, j n^{\kappa} + 2},\; \ldots,\; X_{n,\, (j+1) n^{\kappa}}.$

The variables $X_{n,\, j n^{\kappa} + 1}$ appearing in the indicators on the
right-hand side of (2.4) are independent and identically distributed as normal
$N(0,1)$, and so (2.4) implies that the higher criticism statistic is
approximately equal to its version for the smaller, "effective sample size" of
$\max(1, n^{1-\kappa})$, subsequently multiplied by the factor
$\min(n, n^{\kappa})$. This property gives insight, discussed in Section 2.3,
into higher criticism for strongly dependent data.
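
The blockwise surrogate for the exceedance count can be sketched as follows. The function `blockwise_count` is a hypothetical helper that keeps the first observation of each block of length $\min(n, \lceil n^{\kappa} \rceil)$ and multiplies the resulting count by the block length. When $n$ is not a multiple of the block length, the final block is shorter and the product slightly overstates its contribution.

```python
import math


def blockwise_count(x, t, kappa):
    """Approximate the number of x_i exceeding t from one value per block."""
    n = len(x)
    # Block length min(n, ceil(n^kappa)), using the ceiling convention.
    block_len = min(n, math.ceil(n ** kappa))
    # First element of each consecutive block (0-based positions).
    representatives = range(0, n, block_len)
    return block_len * sum(1 for j in representatives if x[j] > t)


def exact_count(x, t):
    """Exact number of observations exceeding t, for comparison."""
    return sum(1 for v in x if v > t)


# Data that are constant within blocks of length 4, so the surrogate is exact.
x = [1.0] * 4 + [-1.0] * 4 + [2.0] * 4 + [0.0] * 4
approx = blockwise_count(x, 0.5, 0.5)
```

For genuinely dependent data the two counts agree only approximately, and only for thresholds on the scale discussed in the text.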

Even if block size is reduced from $n^{\kappa}$ to $n^{\kappa - \varepsilon}$,
for some $\varepsilon > 0$, it is unreasonable to expect blockwise
decompositions to hold uniformly in $t$. To appreciate why, consider slowly
increasing the value of $t$ until a point $t'$, say, is reached where, for a
particular index $j$, at least one of the variables in the $j$th block first
fails to exceed the level $t$. Although the variables in the $j$th block are
highly correlated, they have a proper joint distribution. Therefore, as we
increase $t$ beyond $t'$, the other variables in the block will fail one by one
to exceed $t$. They will not fail simultaneously, although, since the
correlation is high, they will, with high probability, all fail within a
relatively small interval of values of $t$.

2.3. Summary of properties of higher criticism statistic. We shall argue
that behavior of the higher criticism statistic can be decomposed into two
cases, "degenerate" and "nondegenerate." The degenerate case arises when
$\kappa \ge 1$. Here, in view of (2.2), the effective sample size, for
crossings of a level on a $\sqrt{\log n}$ scale, is just 1. Therefore, if
$\kappa \ge 1$, then the strongly dependent nature of the data effectively
restricts us, when using the conventional higher criticism statistic, to
working with a single data value.

If $\kappa < 1$, then the problem is nondegenerate, and (2.2) and (2.3) imply
that the effective sample size is $N = n^{1-\kappa}$. In this case, if $v > 1$,
then the probability that an exceedance of the level $\sqrt{2v \log N}$ occurs,
among the $N$ independent data, converges to zero, and so we should confine
attention to levels for which $v \le 1$. For simplicity, we skip discussion of
the case $v = 1$. Taking $\sqrt{2q \log n}$ to be the level of the signal, and
equating $\sqrt{2v \log N}$ to $\sqrt{2q \log n}$, we see that "$v < 1$"
translates to the condition "$q < 1 - \kappa$." Therefore, we argue:

(2.5)  Claim. In keeping with the approach developed for independent
data, when $\kappa < 1$ the higher criticism statistic should be used to
address crossings of levels no higher than $\sqrt{2q \log n}$, where
$q < 1 - \kappa$.
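
The change of scale behind this claim can be written out in one line. The following worked step uses only the relation $N = n^{1-\kappa}$ and the assumption $\kappa < 1$:

```latex
\sqrt{2v\log N} = \sqrt{2q\log n}, \quad N = n^{1-\kappa}
\;\Longrightarrow\; v\,(1-\kappa)\log n = q\log n
\;\Longrightarrow\; q = v\,(1-\kappa),
\qquad\text{so that}\qquad v < 1 \iff q < 1-\kappa .
```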

Theorem 3.4 will justify the claim. 

In the discussion above we treated the "null" setting, where no signal is
present. In conventional higher criticism (Donoho and Jin [7]), the nonnull
case is constructed by distributing $n^{1-\beta}$ signals, each equal to
$\sqrt{2r \log n}$ for some $r \in (0,1)$, independently and uniformly among
the $n$ noise variables $X_i$. [To make the signal-detection problem nontrivial
we take $\beta \in (1/2, 1)$.] To keep faith with the blockwise treatment
suggested in Section 2.2, we should ideally distribute $N^{1-\beta}$ blocks of
signals, each block comprising $n^{\kappa}$ signals and each signal equal to
$\sqrt{2r \log N}$ where $0 < r < 1$, among the blocks. Of course, this does
not happen; the blocks are fictions of our mathematical argument, and are not
respected by the physical process that produces the signal. Nevertheless, as we
shall show in Section 3.4, a close parallel with the case of independent data
emerges if we add $N^{1-\beta} n^{\kappa}$ signals, distributed among the $n$
time-series values $X_{ni}$.

It follows that, if $\kappa < 1$ and we distribute $N^{1-\beta} n^{\kappa}$
signals randomly and uniformly among the $n$ time-series values, then the
higher criticism statistic for the time-series dataset $X_{n1}, \ldots, X_{nn}$
is well approximated by a constant multiple of its counterpart when the
time-series is sampled only $N$ times, once at every $n^{\kappa}$ points, and
the $N^{1-\beta}$ signals are distributed among the $N$ sampled points. The
resulting subseries comprises only independent data.

In the case of a strongly dependent time-series $X_{ni}$, even if the magnitude
of the signal is as small as $n^{-C_1}$ for $C_1 > 0$ not too large, the
presence of the signal can be determined accurately merely by deciding that the
signal is present if, for some $i \in [1, n-1]$,
$|X_{n,i+1} - X_{ni}| > n^{-C_2}$, where $0 < C_2 < C_1$. For this simple
signal detector, the probability that the signal is not detected when it is not
present, and the probability that it is detected when it is present, both
converge to 1. However, effectiveness of the method requires information about
the range of dependence. Details will be given in Theorem 3.8.

2.4. Signal detection using the maximum. A commonly used statistic for
signal detection is the maximum of the observed data,
$\mathrm{Max}_n = \max(X_{n1}, \ldots, X_{nn})$. In the case of independent,
$N(0,1)$ noise, the signal is deemed to be present if
$\mathrm{Max}_n > \sqrt{2 \log n}$, and not present otherwise; see, for
example, Donoho and Jin [7]. This "max classifier" has less sensitivity than
higher criticism in the white noise setting, but is more robust to dependence.
Details will be given in Section 3.4.
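
The max classifier amounts to a single comparison. A minimal sketch, with the hypothetical name `max_detector`, is:

```python
import math


def max_detector(data):
    """Declare a signal present when the sample maximum exceeds sqrt(2 log n)."""
    n = len(data)
    return max(data) > math.sqrt(2.0 * math.log(n))


# With n = 100 the threshold is sqrt(2 log 100), roughly 3.03.
flat = [0.0] * 100
spiked = [0.0] * 99 + [4.0]
decisions = (max_detector(flat), max_detector(spiked))
```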

3. Main results. 

3.1. Variance of numerator of higher criticism statistic. Recall that
$\kappa = \alpha_0/\alpha$, where $\alpha, \alpha_0 > 0$ are parameters
governing the autocovariance model at (2.1). Given two functions $a_1$ and
$a_2$ satisfying $0 < a_1(t) \le a_2(t) < \infty$, write
$\langle a_1(t), a_2(t) \rangle$ to denote a quantity which, uniformly in
$n \ge 1$ and $t \ge 1$, is bounded between $C_1 a_1(t)$ and $C_2 a_2(t)$, for
constants $C_1, C_2 > 0$. Let $t_1 = |t| + 1$, for arbitrary real $t$.

Our first theorem describes the variance of the argument of the higher 
criticism statistic, uniformly on the positive half-line. 

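Theorem 3.1. If the autocovariance of the time-series $X_{ni}$ is given by
(2.1), then there exist constants $B_1, B_2 > 1$ such that, uniformly in
$n \ge 1$ and all $t$,

(3.1)  $\mathrm{var}\Bigl\{\sum_{i=1}^{n} I(X_{ni} > t)\Bigr\} = \langle t_1^{-B_1}, t_1^{B_2} \rangle\, n^{\min(\kappa + 1,\, 2)}\, e^{-t^2/2}.$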

Remark 3.1 (Impact of blockwise properties on variance). Recall, from
(2.4), that the numerator of the higher criticism statistic can be
approximated by a construction where the terms in the numerator are grouped
into blocks of length $n^{\kappa}$, and the indicator functions that represent
respective blocks are independent. Referring to (2.4), and writing simply $X$
for a random variable with the standard normal distribution, the variance of
this approximation can be seen to equal

(3.2)  $\max(1, n^{1-\kappa})\, \mathrm{var}\{\min(n, n^{\kappa})\, I(X > t)\} = n^{\min(\kappa + 1,\, 2)}\, \Phi(t)\{1 - \Phi(t)\} = \langle t_1^{-1}, t_1^{-1} \rangle\, n^{\min(\kappa + 1,\, 2)}\, e^{-t^2/2}.$

The right-hand sides of the specific formula (3.1), and its approximation
(3.2), are close to one another.
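
The power of $n$ in the blockwise variance can be checked case by case. The following worked step only restates the two regimes of $\kappa$, using $\mathrm{var}\{c\,I(X > t)\} = c^2\,\Phi(t)\{1 - \Phi(t)\}$ for a constant $c$:

```latex
\kappa < 1:\quad \max(1, n^{1-\kappa})\,\{\min(n, n^{\kappa})\}^{2}
  = n^{1-\kappa}\, n^{2\kappa} = n^{1+\kappa},
\qquad
\kappa \ge 1:\quad \max(1, n^{1-\kappa})\,\{\min(n, n^{\kappa})\}^{2}
  = 1 \cdot n^{2} = n^{2},
```

so in both cases the factor is $n^{\min(\kappa + 1,\, 2)}$.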

3.2. Blockwise properties of higher criticism statistic. Here we state
results that underpin (2.2), (2.4) and (2.5). Recall that
$\kappa = \alpha_0/\alpha$.

Theorem 3.2. Assume that the stationary, zero-mean Gaussian process
$X_{ni}$ has autocovariance given by (2.1), and that $0 < \lambda < \kappa$.
Then, for each $\eta > 0$,

(3.3)  $P\{I(X_{ni} > t) = I(X_{n1} > t) \text{ for } 1 \le i \le n^{\lambda}\} = 1 - O(n^{\lambda - (\alpha_0/2) + \eta}\, e^{-t^2/2}),$

uniformly in all $t$.

Since $\lambda < \kappa$ can be chosen arbitrarily close to $\kappa$, Theorem
3.2, and related properties such as those at (2.3) and in Theorem 3.3,
demonstrate that the process of indicator values, $I(X_{ni} > t)$, can be
divided approximately into blocks of length $n^{\kappa}$. Here the
approximation is logarithmic: the logarithm of block length is approximately
equal to $\log(n^{\kappa})$. The difference in size of the higher criticism
statistic, in the presence and in the absence of signals, respectively, is
small on the same scale; the change in size is by a factor of
$n^{\varepsilon}$, where $\varepsilon$ can be very small although always
strictly positive, and depends on the distance that the signal lies above the
detection boundary. See Theorem 3.7. Therefore, measurement of block size on a
logarithmic scale is appropriate in the present setting.

A corollary of Theorem 3.2 is that, if $\alpha > 2$, which ensures that
$\kappa - (\alpha_0/2) < 0$; and if $t_n$ is any sequence of positive constants
such that $t_n/n^{\delta} \to 0$ for each $\delta > 0$; then:

(3.4)  for each $\varepsilon > 0$,  $\sup_{t:\, |t| \le t_n} \bigl|1 - P(X_{ni} > t \text{ for } 2 \le i \le n^{\kappa - \varepsilon} \mid X_{n1} > t)\bigr| \to 0.$

This is a strong form of (2.2).

Result (3.4) also implies that if $\kappa > 1$, then, with high probability,
either all the data in the sample $X_{n1}, \ldots, X_{nn}$ are above the level
$t$, or all are below that level. Indeed, with $t_n$ as before,

(3.5)  $\sup_{t:\, |t| \le t_n} \bigl|1 - P(X_{ni} > t \text{ for } 2 \le i \le n \mid X_{n1} > t)\bigr| \to 0.$

However, as (2.3) indicates, (3.5) fails when $\kappa = 1$.

Our next result quantifies the approximation at (2.4). We already know,
from (3.5), that if $\kappa > 1$, then (2.4) holds in the sense that, for
$j = 0$ or $1$,

(3.6)  $P\Bigl\{\sum_{i=1}^{n} I(X_{ni} > t) = n\, I(X_{n1} > t) \Bigm| I(X_{n1} > t) = j\Bigr\} \to 1,$

uniformly in $|t| \le t_n$. For $\kappa < 1$ the approximation is a little more
tricky, in that, as indicated at the end of the previous paragraph, block
length cannot be quite as long as $n^{\kappa}$ if the sum of indicator
functions, on the left-hand side of (2.4), is to decompose into blocks with
high probability. Moreover, when there is more than one block it is awkward to
sharpen the result by conditioning, as at (3.6). Theorem 3.3, below, gives a
concise account of subdivision into blocks of length
$\lfloor n^{\lambda} \rfloor$, where $\lambda < \kappa$ can be chosen
arbitrarily close to $\kappa$.

Before stating Theorem 3.3 we give a little notation. Recall that
$\kappa = \alpha_0/\alpha$. If $\kappa \ge 1$, take $b = 1$ and
$\mathcal{B}_1 = \{1, \ldots, n\}$. If $\kappa < 1$ and
$\lambda \in (0, \kappa)$, partition the integers $1, \ldots, n$ into $b$
consecutive, adjacent blocks $\mathcal{B}_1, \ldots, \mathcal{B}_b$, where the
first $b - 1$ blocks contain just $\lfloor n^{\lambda} \rfloor$ integers and
the last block is of length between 1 and $\lfloor n^{\lambda} \rfloor$. (Thus,
$b \sim n^{1-\lambda}$.) Let $s_j$ denote the least element of $\mathcal{B}_j$.
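
The partition just described is straightforward to construct. The sketch below uses a hypothetical helper `partition_blocks`, which returns the blocks as Python ranges over the integers $1, \ldots, n$; the least element of each block is then the start of the corresponding range.

```python
import math


def partition_blocks(n, lam):
    """Split 1..n into consecutive blocks of length floor(n^lam).

    The last block may be shorter, with length between 1 and floor(n^lam).
    """
    size = math.floor(n ** lam)
    return [range(start, min(start + size, n + 1))
            for start in range(1, n + 1, size)]


# Illustrative values: n = 10 and lam = 0.5 give blocks of length 3, 3, 3, 1.
blocks = partition_blocks(10, 0.5)
least_elements = [block[0] for block in blocks]
```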

Theorem 3.3. If the assumptions in Theorem 3.2 hold, then, for each
$\eta > 0$,

(3.7)  $P\{I(X_{ni} > t) = I(X_{n s_j} > t) \text{ for } i \in \mathcal{B}_j \text{ and } 1 \le j \le b\} = 1 - O(n^{1 - (\alpha_0/2) + \eta}\, e^{-t^2/2})$

uniformly in all $t$.

Remark 3.2 (Blockwise clumping of indicator variables). If we add to
the assumptions of Theorem 3.3 the condition $\alpha_0 > 2$, then (3.7) implies
that

$\sup_{-\infty < t < \infty} \bigl|1 - P\{I(X_{ni} > t) = I(X_{n s_j} > t) \text{ for } i \in \mathcal{B}_j \text{ and } 1 \le j \le b\}\bigr| \to 0.$

This result, and (3.4), underpin the blockwise decomposition of the higher
criticism statistic, discussed in Section 2.2.

Finally in this section we justify (2.5). We argue that if
$t = \sqrt{2q \log n}$, where $q > 1 - \kappa$, then, in the majority of
samples, none of the noise data exceed the level $t$. See (3.8) below.
Therefore, if the signal is at level $t$ or larger, and if it is added to the
noise, then it will be very easy to detect. Hence, to make the signal-detection
problem reasonably difficult, we should ensure that $q < 1 - \kappa$, as
claimed at (2.5).

Theorem 3.4. Assume that the stationary, zero-mean Gaussian process
$X_{ni}$ has autocovariance given by (2.1), and that $\kappa < 1$. Then, if
$t = \sqrt{2q \log n}$ where $q > 1 - \kappa$,

(3.8)  $P(X_{ni} > t \text{ for some } 1 \le i \le n) \to 0.$

3.3. Size of higher criticism statistic. Here we address properties of
$\mathrm{hc}_n$, defined at (1.1), in cases where $\kappa < 1$. Our analysis of
blockwise characteristics of the higher criticism statistic has already shown
that the case $\kappa > 1$ is relatively uninteresting, since there, almost all
the data $X_{n1}, \ldots, X_{nn}$ cross a given level $t$ at the same time.

We first address the nature of the set $S_n$ in the definition of
$\mathrm{hc}_n$. If the data $X_{ni}$ were independent and identically
distributed, or equivalently, if $\kappa = 0$, then we could take
$S_n = [-t_n, t_n]$ where $t_n = \sqrt{2s \log n}$ and $0 < s < 1$. Taking
$s > 1$ is inappropriate, since the version of Theorem 3.4 that applies in the
setting of independent data then implies that the level $t_n$ is hardly ever
exceeded by the data in a sample of $n$ standard normal random variables;
the level is too large.

To remove this difficulty we should instead consider a value of $t_n$ such
that $n \bar\Phi(t_n)$ is bounded away from zero. A borderline sequence of this
type is $t_n = \{2 \log n - \log(C \log n)\}^{1/2}$, where $C$ is any positive
constant. Here, $n \bar\Phi(t_n)$ converges to $\{C/(4\pi)\}^{1/2}$, as follows
from the tail approximation $\bar\Phi(t) \sim (2\pi)^{-1/2}\, t^{-1} e^{-t^2/2}$.
We know from the results discussed in Section 3.2 that, when $\kappa < 1$, the
data behave as though they were partitioned into $n^{1-\kappa}$ blocks, where
the indicators $I(X_{ni} > t)$ are identical within the block. Therefore, when
$\kappa < 1$ we should ask that $n^{1-\kappa} \bar\Phi(t_n)$, rather than
$n \bar\Phi(t_n)$, be bounded away from zero. This appreciation motivates the
assumption on $t_n$ imposed in the theorem below.
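
The behavior of the borderline sequence can be checked numerically. The sketch below uses the hypothetical names `borderline_level` and `expected_exceedances` and evaluates $n \bar\Phi(t_n)$ for a given $n$ and $C$. Under the standard normal tail approximation the limiting value is $\{C/(4\pi)\}^{1/2}$, but convergence is slow (the error decays only like $\log\log n / \log n$), so moderate $n$ gives agreement to about two decimal places at best.

```python
import math


def normal_sf(t):
    """Upper tail probability 1 - Phi(t) of the standard normal distribution."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def borderline_level(n, c):
    """Level t_n = {2 log n - log(c log n)}^(1/2)."""
    log_n = math.log(n)
    return math.sqrt(2.0 * log_n - math.log(c * log_n))


def expected_exceedances(n, c):
    """Expected number of exceedances of t_n among n independent N(0,1) values."""
    return n * normal_sf(borderline_level(n, c))


# Compare with the limiting value sqrt(c / (4 pi)) for c = 1.
limit = math.sqrt(1.0 / (4.0 * math.pi))
values = [expected_exceedances(n, 1.0) for n in (1e6, 1e12, 1e100)]
```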

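Theorem 3.5. Assume that the stationary, zero-mean Gaussian process
$X_{ni}$ has autocovariance given by (2.1), that $\kappa < 1$, and that
$t_n \to \infty$ in such a manner that $n^{1-\kappa} \bar\Phi(t_n)$ is bounded
away from zero. Then, for all $\eta > 0$,

(3.9)  $P\Biggl[\sup_{|t| \le t_n} \dfrac{\sum_{i} \{I(X_{ni} > t) - P(X_{ni} > t)\}}{\{n\,\Phi(t)\,\bar\Phi(t)\}^{1/2}} > n^{(\kappa/2) + \eta}\Biggr] \to 0.$
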

Theorem 3.5 points to the size of critical point appropriate for a test of
significance involving the higher criticism statistic, as follows. A
multivariate central limit theorem for values of $\sum_i I(X_{ni} > t)$, for a
fixed but arbitrarily large number of different $t$'s, implies that if the
critical point $c_n(\alpha)$ is to satisfy

(3.10)  $P\Biggl[\sup_{|t| \le t_n} \dfrac{\sum_{i} \{I(X_{ni} > t) - P(X_{ni} > t)\}}{\{n\,\Phi(t)\,\bar\Phi(t)\}^{1/2}} > c_n(\alpha)\Biggr] = \alpha,$

where $0 < \alpha < 1$ is fixed (here $\alpha$ denotes a significance level,
not the exponent in (2.1)), then $c_n(\alpha) = n^{\kappa/2} d_n(\alpha)$,
where $d_n(\alpha)$ diverges to infinity. (The central limit theorem can be
proved using the method of moments.) On the other hand, Theorem 3.5 shows that
$d_n(\alpha)$ is no larger than $O(n^{\eta})$, for any $\eta > 0$. These
considerations lead to the following corollary to Theorem 3.5.

Theorem 3.6. If $X_{ni}$ is a stationary, zero-mean Gaussian process with
autocovariance given by (2.1), if $0 < \kappa < 1$, if $t_n \to \infty$ such
that $n^{1-\kappa} \bar\Phi(t_n)$ is bounded away from zero, and if
$c_n(\alpha)$ is defined by (3.10), then $c_n(\alpha) = n^{\kappa/2} d_n(\alpha)$
where, as $n$ increases, $d_n(\alpha)$ diverges to infinity for each fixed
$\alpha$, but equals $O(n^{\eta})$ for each $\eta > 0$.

Remark 3.3 (Calibration). In practice, a critical point would be determined
either by experience with the time-series $X_{ni}$, or by simulation from
a model for those data. In either case, $c_n(\alpha)$ would, in effect, be
found empirically. However, as Theorem 3.7 will show, it is not important that
$c_n(\alpha)$ be determined particularly accurately.

Let $0 < \kappa < 1$ and $N = n^{1-\kappa}$ as in Section 2, and consider the
impact of adding a signal, equal to

(3.11)  $\nu = \sqrt{2r \log N},$

to just $N^{1-\beta} n^{\kappa}$ of the standard normally distributed data
$X_{n1}, \ldots, X_{nn}$. This changes the data from $X_{ni}$ to

(3.12)  $Y_{ni} = X_{ni} + I_{ni}\, \nu,$

where $I_{ni}$ is a process of zeros and ones. This suggests that we add a
signal to all $n^{\kappa}$ time-series values in each of $N^{1-\beta}$ blocks.
However, we do not need to add the signals in this blockwise way. They can be
added in any deterministic manner, or in any random way that is stochastically
independent of the time-series $X_{ni}$. We take $1/2 < \beta < 1$, to make the
signal-detection problem relatively difficult.
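
One way of placing the signals, uniformly at random and independently of the noise, can be sketched as follows. The helper `add_sparse_signal` and its arguments are hypothetical; rounding the number of signals to the nearest integer is a choice made for the example.

```python
import math
import random


def add_sparse_signal(x, kappa, beta, r, rng):
    """Add nu = sqrt(2 r log N) to about N^(1-beta) * n^kappa entries of x.

    Here N = n^(1-kappa); positions are drawn without replacement using rng.
    """
    n = len(x)
    big_n = n ** (1.0 - kappa)
    nu = math.sqrt(2.0 * r * math.log(big_n))
    count = min(n, round(big_n ** (1.0 - beta) * n ** kappa))
    positions = set(rng.sample(range(n), count))
    return [xi + nu if i in positions else xi for i, xi in enumerate(x)]


# Illustrative usage on a zero "noise" vector, so that the signal is visible.
rng = random.Random(1)
y = add_sparse_signal([0.0] * 1000, kappa=0.5, beta=0.75, r=0.5, rng=rng)
```

With $n = 1000$, $\kappa = 0.5$ and $\beta = 0.75$ the number of signals is $n^{0.625}$, which rounds to 75.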

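The detection boundary of Donoho and Jin [7] (see also Ingster [14, 15]
and Jin [17]) is the locus of points $(\beta, r)$, with $1/2 < \beta < 1$,
such that

(3.13)  $r = \begin{cases} \beta - \tfrac{1}{2}, & \text{if } \tfrac{1}{2} < \beta \le \tfrac{3}{4}, \\ (1 - \sqrt{1 - \beta})^2, & \text{if } \tfrac{3}{4} < \beta < 1. \end{cases}$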
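
The detection boundary for independent data is easy to tabulate. The sketch below, using the hypothetical name `hc_detection_boundary`, encodes the standard two-piece Donoho and Jin boundary, $r = \beta - 1/2$ for $1/2 < \beta \le 3/4$ and $r = (1 - \sqrt{1 - \beta})^2$ for $3/4 < \beta < 1$; the two pieces agree at $\beta = 3/4$.

```python
import math


def hc_detection_boundary(beta):
    """Value of r on the detection boundary for sparsity index beta in (1/2, 1)."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie strictly between 1/2 and 1")
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2


def lies_above_boundary(beta, r):
    """True when the point (beta, r) lies strictly above the boundary."""
    return r > hc_detection_boundary(beta)


# A few points on the boundary, for illustration.
curve = [(b, hc_detection_boundary(b)) for b in (0.6, 0.75, 0.9)]
```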



The theorem below shows that, if $c_n(\alpha)$ is the critical point defined by
(3.10); and if we assert that the signal is present if the higher criticism
statistic exceeds $c_n(\alpha)$, or even if it exceeds a bound that is larger
than that quantity to a small but fixed polynomial extent; then the probability
that we make the correct decision when the signal is present, converges to 1 if
$(\beta, r)$ lies above the boundary. Below we take $N = n^{1-\kappa}$, where
$0 < \kappa < 1$, and write $\mathrm{hc}_n^{\mathrm{sig}}$ for the version of
$\mathrm{hc}_n$ when $X_{ni}$, at (1.1), is replaced by $Y_{ni}$.



Theorem 3.7. Let $X_{ni}$ be a stationary Gaussian process with zero mean
and autocovariance as at (2.1), and let $Y_{ni}$ be as at (3.12), with the
$I_{ni}$'s independent of $X_{n1}, \ldots, X_{nn}$ and just
$N^{1-\beta} n^{\kappa}$ of them equal to 1. Assume that
$n^{1-\kappa} \bar\Phi(t_n)$ is bounded away from zero, and $(\beta, r)$ lies
strictly above the detection boundary. Then, for $\delta > 0$ sufficiently
small,

(3.14)  $P(\mathrm{hc}_n^{\mathrm{sig}} > n^{(\kappa/2) + \delta}) \to 1.$



Define $\xi = (1 - \beta)(1 - \kappa)$. If we consider that
$n^{1-\beta'} = N^{1-\beta} n^{\kappa} = n^{\xi + \kappa}$ signals, each of
size $\sqrt{2r' \log n} = \sqrt{2r \log N}$, have been added to the time-series
$X_{n1}, \ldots, X_{nn}$, then $(\beta', r')$ is related to $(\beta, r)$ by the
formulas $\beta' = \beta(1 - \kappa)$ and $r' = r(1 - \kappa)$. Figure 1 graphs
the detection boundary determined by (3.13), in terms of $(\beta', r')$ rather
than $(\beta, r)$.
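
The rescaling from $(\beta, r)$ to $(\beta', r')$ can be verified numerically. In the sketch below the function names are hypothetical; the second function returns the two exponents of $n$ that should coincide, namely $1 - \beta'$ and $(1 - \beta)(1 - \kappa) + \kappa$.

```python
def rescale_to_full_sample(beta, r, kappa):
    """Map (beta, r), stated for N = n^(1-kappa), to (beta', r') stated for n."""
    return beta * (1.0 - kappa), r * (1.0 - kappa)


def signal_count_exponents(beta, kappa):
    """Exponents of n in n^(1-beta') and in N^(1-beta) * n^kappa."""
    beta_prime, _ = rescale_to_full_sample(beta, 0.0, kappa)
    return 1.0 - beta_prime, (1.0 - beta) * (1.0 - kappa) + kappa


# Illustrative values: beta = 0.8, r = 0.4, kappa = 0.5.
beta_prime, r_prime = rescale_to_full_sample(0.8, 0.4, 0.5)
lhs, rhs = signal_count_exponents(0.8, 0.5)
```

The signal sizes match as well, since $2r' \log n = 2r(1 - \kappa)\log n = 2r \log N$.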

Fig. 1. Detection boundary graphed in terms of $(\beta', r')$ for different
$\kappa$. Recall that $\beta' = (1 - \kappa)\beta$ and $r' = (1 - \kappa) r$.
Each curve depicts the boundary shown at (3.13), with variables $(\beta, r)$
rescaled to $(\beta', r')$. The curves, in order from top to bottom, are for
the cases $\kappa = 0.6, 0.4, 0.2, 0$, the last denoting the context of
independent data. The horizontal dashed lines show graphs of $r' = 1 - \kappa$,
for the four respective values of $\kappa$.



Remark 3.4 [Performance of higher criticism when $(\beta, r)$ lies below the
detection boundary]. An argument similar to that given in the proof of
Theorem 3.7 may be used to prove that, if $(\beta, r)$ lies strictly below the
detection boundary, and if the positive constants $d_n$ diverge at rate
$O(n^{\delta})$ for each $\delta > 0$, although nevertheless sufficiently fast,
then $P(\mathrm{hc}_n^{\mathrm{sig}} > n^{\kappa/2} d_n) \to 0$.

Therefore, the higher criticism statistic cannot be relied on to give accurate
detection when $(\beta, r)$ lies below the detection boundary.

If we are aware of the presence of strong dependence, and can exploit
it through an accurate mathematical model for both the error process and
the signal, then the dependence can be utilized to produce a signal detector
with a high degree of sensitivity. To see why, consider adding a much smaller
signal than before, this time of size only
$\nu' = \pm 2\,(r\, n^{-\alpha_0} \log n)^{1/2}$ where $r > 1$, to between one
and $n - 1$ points of the time-series $X_{n1}, \ldots, X_{nn}$, thereby
obtaining the data $Y'_{ni}$, say. Let $1 < C < r$, and suppose that we
determine the signal to be present if

$\max_{1 \le i \le n-1} |Y'_{n,i+1} - Y'_{ni}| > 2\,(C\, n^{-\alpha_0} \log n)^{1/2}.$

If this inequality fails, we determine the signal to be absent. We shall call
this the "neighbor-difference detector" (NDD). In both conception and effect
it is rather like assessing the time-series $Y'_{ni}$ using a high-pass filter.
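
The neighbor-difference rule translates directly into code. The sketch below, with the hypothetical name `neighbor_difference_detector`, assumes that $\alpha_0$ is known and that a constant $C$ has been chosen by the user; it compares the largest absolute difference between adjacent values with the threshold $2(C n^{-\alpha_0} \log n)^{1/2}$.

```python
import math


def neighbor_difference_detector(y, alpha0, c):
    """Return True when some adjacent difference exceeds the NDD threshold."""
    n = len(y)
    threshold = 2.0 * math.sqrt(c * n ** (-alpha0) * math.log(n))
    largest_jump = max(abs(b - a) for a, b in zip(y, y[1:]))
    return largest_jump > threshold


# Noise-free illustration: a step of height 1 against a threshold near 0.53.
step = [0.0] * 50 + [1.0] * 50
decision = neighbor_difference_detector(step, alpha0=1.0, c=1.5)
```

In practice the series would contain strongly correlated noise, whose adjacent differences are of order $n^{-\alpha_0/2}$ and therefore small relative to the threshold.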

Our next result shows that NDD enjoys a high degree of sensitivity.
Depending on how much information is available about the signal and the noise
process, NDD might require, relative to other methods, greater knowledge of
the covariance at (2.1). The main issue in practice, however, is determining
when the level of correlation has increased to such an extent that methods
based on the assumption of independence are no longer competitive. As
noted in Section 1.4, this point can be reached relatively subtly, for example,
as imaging technology improves through decreases in pixel size.

Theorem 3.8. If $1 < C < r$, then the probability that NDD correctly
determines that a signal is present, given that it is, and the probability that
the detector correctly determines that a signal is not present, given that it is
not, both converge to 1 as $n \to \infty$.

Note that, although Theorem 3.8 treats only one signal, that signal is
permitted to be added at any number of points, between one and $n - 1$, and
can be polynomially small in size, rather than logarithmically large as would
be required in the case of higher criticism. Therefore, when using NDD the
signal can be much smaller in size, and much less frequently present, than
in the case of higher criticism; and nevertheless NDD manages to detect it.

3.4. Properties of $\mathrm{Max}_n$. The max classifier was defined in
Section 2.4. In the case of independent N(0, 1) data, the rule given there
leads to asymptotically correct classification if
$r > (1 - \sqrt{1 - \beta})^2$, and to correct classification with limiting
probability $\frac{1}{2}$ if $r < (1 - \sqrt{1 - \beta})^2$. The resulting
detection boundary, with equation $r = (1 - \sqrt{1 - \beta})^2$, coincides
with that for higher criticism if $\frac{3}{4} < \beta < 1$, but is above the
higher criticism boundary if $\frac{1}{2} < \beta < \frac{3}{4}$. However,
unlike higher criticism, the maximum-based detector maintains its performance
in the presence of a polynomially large amount of dependence.
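The comparison of the two boundaries for independent data can be checked numerically. The sketch below assumes the standard form of the higher criticism boundary for independent N(0, 1) data, namely $r = \beta - \frac{1}{2}$ on $(\frac{1}{2}, \frac{3}{4}]$ and $r = (1 - \sqrt{1 - \beta})^2$ on $(\frac{3}{4}, 1)$; that form is not restated in this section and is brought in here as an assumption. The function names are hypothetical.

```python
import math


def max_boundary(beta):
    """Boundary r = (1 - sqrt(1 - beta))^2 for the max classifier."""
    return (1.0 - math.sqrt(1.0 - beta)) ** 2


def hc_boundary(beta):
    """Higher criticism boundary for independent data (assumed standard form)."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie in (1/2, 1)")
    # Linear branch below 3/4, max-type branch above it.
    return beta - 0.5 if beta <= 0.75 else max_boundary(beta)


# Print both boundaries at a few sparsity levels for comparison.
for beta in (0.6, 0.75, 0.9):
    print(beta, round(hc_boundary(beta), 4), round(max_boundary(beta), 4))
```

For $\beta$ below $\frac{3}{4}$ the max boundary sits strictly above the higher criticism boundary, and from $\frac{3}{4}$ onward the two agree, matching the statement in the text.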

To appreciate this point, given $\beta \in (\frac{1}{2}, 1)$ and a nonnegative
sequence $a_n$ decreasing to zero, let $\mathcal{X}(\beta, a_n)$ denote the set
of all distributions of time-series $X_{n1}, \ldots, X_{nn}$ for which each
$X_{ni}$ is distributed as N(0, 1), and the covariances
$\rho_{ij} = \mathrm{cov}(X_{ni}, X_{nj})$ satisfy $|\rho_{ij}| \le a_n$ for all
$i, j$ such that $1 \le i, j \le n$ and $|i - j| \ge n^{\beta}$. Define $Y_{ni}$
by adding the sparse signal $I_{ni}\nu_1$ to $X_{ni}$, as at (3.12). Here, to
delineate the way in which performance of the max classifier does not vary
with dependence, we take $\nu_1$ to equal the value that $\nu$, in (3.12), would
take in the absence of correlation: $\nu_1 = \sqrt{2r \log n}$.
Compare (3.11).

Theorem 3.9. If $\beta \in (\frac{1}{2}, 1)$ and $r > (1 - \sqrt{1 - \beta})^2$,
then the probability that the max classifier correctly detects that a signal
is present when it is, and the probability that the classifier does not detect
a signal when it is not present, both converge to 1 uniformly in time-series
in $\mathcal{X}(\beta, a_n)$.

4. Numerical properties. We report here a simulation study assessing
the performance of $\mathrm{hc}_n$. The analysis involved selecting pairs
$(r, \beta)$ above the detection boundary, and generating samples from either
population. To simulate a stationary Gaussian process
$X_{n1}, \ldots, X_{nn}$ with the covariance function specified in (2.1), we
used the method suggested by Wood and Chan [29].

Recall that $\kappa = \alpha_0/\alpha$, $r' = (1 - \kappa) r$ and
$\beta' = (1 - \kappa)\beta$, and that we obtain the process
$Y_{n1}, \ldots, Y_{nn}$ by adding to $X_{n1}, \ldots, X_{nn}$ a total of n^~^
signals, each of strength $\sqrt{2r' \log n}$. For the results reported here we
selected two values of $(\beta, r)$, and for each, studied the performance of
$\mathrm{hc}_n$ for different values of $\alpha_0$. Specifically, we chose
$(\beta, r) = (0.6, 0.35)$ and $(\beta, r) = (0.75, 0.5)$; each is a vertical
distance 0.25 above the detection boundary at (3.13). For each $(\beta, r)$ we
treated n = 2^^ and n = 2'^^, $\alpha = 0.5$ and
$\alpha_0 = 0.05, 0.1, 0.15, 0.20$, so that $\kappa = 0.1, 0.2, 0.3, 0.4$. We
chose $n$ large since the signals are highly sparse.

Our project was implemented in the following sequence of steps: (1) Generate
the stationary Gaussian process $\{X_{n1}, \ldots, X_{nn}\}$ having the
covariance function at (2.1). (2) Defining K = n^~^, generate $K$ variates from
the uniform distribution on the unit interval, ordering them as
$u_1 < u_2 < \cdots < u_K$, and generate a sequence $I_{ni}$ of zeros and ones,
taking the value 1 at $i = \langle n u_j \rangle$ for $j = 1, 2, \ldots, K$, and
the value 0 elsewhere. (Here, $\langle x \rangle$ denotes the integer nearest to
$x$.) (3) Put $Y_{ni} = X_{ni} + \sqrt{2r' \log n}\, I_{ni}$ for
$1 \le i \le n$. (4) Construct $\mathrm{hc}_n$ from the data $X_{ni}$ and
$Y_{ni}$, where in the definition of $\mathrm{hc}_n$,
$\mathcal{S}_n = (-t_n, t_n)$ with $t_n = \bar\Phi^{-1}(n^{\kappa-1})$ and
$\bar\Phi = 1 - \Phi$. (5) Repeat steps (1)-(4) 100 independent times. The
Matlab code can be found at
www.stat.purdue.edu/~jinj/Research/software/HallandJin.
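Steps (2) and (3) above can be sketched as follows. This is not the authors' Matlab code: the function name `add_sparse_signal` is hypothetical, the number of signals is passed in as a plain argument rather than computed from $n$, and clipping the rounded index into $1, \ldots, n$ is an added assumption to keep indices valid. Step (1), generation of the correlated Gaussian process, is not reproduced; the usage lines substitute independent noise purely as a placeholder.

```python
import math
import random


def add_sparse_signal(x, num_signals, r_prime, rng):
    """Steps (2)-(3): choose signal locations and add the signal to x.

    x           : noise series X_n1, ..., X_nn as a list of floats
    num_signals : K, the number of uniform variates drawn (supplied by caller)
    r_prime     : r' in the signal strength sqrt(2 r' log n)
    rng         : a random.Random instance
    """
    n = len(x)
    # Step (2): K ordered uniform variates on the unit interval.
    u = sorted(rng.random() for _ in range(num_signals))
    indicator = [0] * n
    for uj in u:
        # Integer nearest to n * u_j; clipping to 1..n is an assumption.
        i = min(max(int(math.floor(n * uj + 0.5)), 1), n)
        indicator[i - 1] = 1
    # Step (3): Y_ni = X_ni + sqrt(2 r' log n) * I_ni.
    strength = math.sqrt(2.0 * r_prime * math.log(n))
    y = [xi + strength * ind for xi, ind in zip(x, indicator)]
    return y, indicator


# Placeholder usage with independent noise (NOT the covariance at (2.1)).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
y, indicator = add_sparse_signal(noise, 8, 0.3, rng)
print(sum(indicator))
```

Two uniform variates can round to the same index, so the number of ones in the indicator sequence may be smaller than $K$.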

For $(\beta, r) = (0.6, 0.35)$, if we decide to reject the null hypothesis of
"no signal" when $n^{-\kappa/2}\,\mathrm{hc}_n > 2.2$, then, for n = 2^^, the
empirical probability among 100 runs of committing a type I error equals 0.00
in the respective cases $\alpha_0 = 0.05, 0.10, 0.15$ and $0.20$, and the
corresponding empirical probabilities of committing a type II error are 0.00,
0.00, 0.00 and 0.06. Increasing n to 2^*^, the respective type I error
probabilities become 0.03, 0.00, 0.00 and 0.01, and type II error
probabilities decrease to 0.00 in each case.

Table 1
Mean and standard deviation (SD) of $n^{-\kappa/2}\,\mathrm{hc}_n$

n       $2^{10}$   $2^{12}$   $2^{14}$   $2^{16}$   $2^{18}$   $2^{20}$
Mean    1.0273     1.0504     0.9162     0.8851     0.8467     0.7823
SD      0.6234     0.4290     0.3810     0.4086     0.3702     0.3805


For $(\beta, r) = (0.75, 0.5)$, if we reject the null hypothesis when $n^{-\kappa/2}\,\mathrm{hc}_n > 1.7$,
then, for n = 2^^, the type I error probabilities are 0.25, 0.02, 0.02 and 0.03, 
and the type II error probabilities are 0.04, 0.10, 0.21 and 0.41; they decrease 
to 0.17, 0.03, 0.04 and 0.00, and to 0.00, 0.00, 0.00 and 0.02, respectively, 
when n = 2^°. 

We also investigated how closely the values of $\mathrm{hc}_n$ accord to our
asymptotic analysis, which states that under the null hypothesis,
$\mathrm{hc}_n = O_p(n^{(\kappa/2)+\varepsilon})$ and
$n^{(\kappa/2)-\varepsilon} = O_p(\mathrm{hc}_n)$ for each $\varepsilon > 0$.
We therefore conducted a simulation study where we fixed
$(\alpha_0, \alpha) = (0.1, 0.5)$, entailing $\kappa = 0.2$, and took
$\log_2 n$ in the range 10(2)20. For each $n$ we generated a stationary
Gaussian process with the covariance structure specified in (2.1), and
calculated $\mathrm{hc}_n$. We then repeated the simulations 100 independent
times. The results are tabulated in Table 1, which shows that as $n$ increases
from $10^3$ to $10^6$, the mean and standard deviation of
$n^{-\kappa/2}\,\mathrm{hc}_n$ alter by only 24% and 39%, respectively,
over the thousand-fold range of values of $n$.
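The two percentages quoted from Table 1 can be recomputed directly from the first and last columns. The short check below uses only the tabulated values; the helper name `relative_change` is introduced for illustration.

```python
# Table 1 entries for n = 2^10, 2^12, ..., 2^20.
means = [1.0273, 1.0504, 0.9162, 0.8851, 0.8467, 0.7823]
sds = [0.6234, 0.4290, 0.3810, 0.4086, 0.3702, 0.3805]


def relative_change(first, last):
    """Absolute change from first to last, as a fraction of first."""
    return abs(last - first) / first


mean_change = relative_change(means[0], means[-1])
sd_change = relative_change(sds[0], sds[-1])
print(round(100 * mean_change), round(100 * sd_change))  # prints "24 39"
```

The ratio $2^{20}/2^{10} = 1024$ is the thousand-fold range of $n$ referred to in the text.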



5. Technical arguments. 



5.1. Proof of Theorem 3.1. The case where $t$ lies in a bounded interval
is straightforward to treat, and the situation of large negative $t$ can
be addressed analogously to that for large positive $t$. Therefore we confine
attention to $t \ge 1$.

Let $X$ and $Y$ be jointly normally distributed with zero means, unit
variances and correlation coefficient $\rho$. Result (3.1) can be derived using
the following lemma.

Lemma 5.1. If $t \to \infty$ and $\rho \to 1$ in such a manner that
$t^2(1 - \rho) \to 0$, then

$$1 - P(X > t \mid Y > t) \sim \frac{t(1 - \rho)^{1/2}}{\pi^{1/2}}. \tag{5.1}$$



5.2. Proof of Theorem 3.2. For reasons that preface the proof of
Theorem 3.1, it suffices to treat the case $t \ge 1$.

Lemma 5.2. Let $Z_{k1}, \ldots, Z_{k m_k}$, for $1 \le k \le \ell$, denote
normal random variables with zero means, unit variances and
$\mathrm{cov}(Z_{ki}, Z_{k1}) = \rho_{ki} \ge 0$ for $1 \le i \le m_k$. Put
$m = m_1 + \cdots + m_\ell$ and

$$\rho_{\min} = \min\{\rho_{ki} : 1 \le i \le m_k,\ 1 \le k \le \ell\}.$$

Then, uniformly in $m \ge 1$ and $t \ge 1$,

$$|1 - P\{I(Z_{ki} > t) = I(Z_{k1} > t) \text{ for } 1 \le i \le m_k \text{ and } 1 \le k \le \ell\}| \le A m (1 - \rho_{\min})^{1/2} e^{-t^2/2}, \tag{5.2}$$

where $A$ is a positive absolute constant.

Proof. Denote the left-hand side of (5.2) by LHS. Then,

$$\mathrm{LHS} \le \sum_{k=1}^{\ell} \sum_{i=2}^{m_k} P\{I(Z_{ki} > t) \ne I(Z_{k1} > t)\} = 2\{1 - \Phi(t)\} \sum_{k=1}^{\ell} \sum_{i=2}^{m_k} \{1 - P(Z_{ki} > t \mid Z_{k1} > t)\}.$$

Result (5.2), and hence Lemma 5.2, follows from this bound and (5.1).
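The equality in the preceding display rests on an elementary identity for two events of equal probability. A short derivation is sketched below; the event names $A$ and $B$ are introduced here for illustration only.

```latex
% A = {Z_{ki} > t} and B = {Z_{k1} > t}; both have probability 1 - \Phi(t)
% because each variable is standard normal.
\begin{align*}
P\{I(A) \neq I(B)\}
  &= P(A \cap B^{c}) + P(A^{c} \cap B) \\
  &= P(A) + P(B) - 2P(A \cap B) \\
  &= 2P(B)\{1 - P(A \mid B)\} \\
  &= 2\{1 - \Phi(t)\}\{1 - P(Z_{ki} > t \mid Z_{k1} > t)\}.
\end{align*}
```

The third line uses $P(A) = P(B)$, and summing over $i$ and $k$ after a union bound gives the stated inequality.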

Next we derive Theorem 3.2. Given $\varepsilon \in (0, \lambda)$, partition the
set of integers in the interval $[1, n^{\lambda}]$ into
$n^{\lambda-\varepsilon}$ nonoverlapping subintervals, each of length
$n^{\varepsilon}$, where the subintervals of integers are ordered as
$\mathcal{I}_1, \ldots, \mathcal{I}_{n^{\lambda-\varepsilon}}$ from left
to right along the real line. (We shall omit integer-part notation, and also
omit the straightforward treatment of the case where the last interval is a
fragment, shorter than the other intervals.) Let $i_j$ denote the integer
furthest to the left in $\mathcal{I}_j$. Then, using Lemma 5.2, we deduce that
uniformly in $t \ge 1$,

$$P\{I(X_{ni} > t) = I(X_{n i_j} > t) \text{ for } i \in \mathcal{I}_j \text{ and } 1 \le j \le n^{\lambda-\varepsilon}\} = 1 - O\{n^{\lambda}(n^{\varepsilon\alpha}/n^{\alpha_0})^{1/2} e^{-t^2/2}\} = 1 - O(n^{\lambda+(\varepsilon\alpha/2)-(\alpha_0/2)} e^{-t^2/2}). \tag{5.3}$$

Suppose $0 < \lambda_1 < \lambda$, and that we have partitioned the interval
$[1, n^{\lambda}]$ into $n^{\lambda-\lambda_1}$ nonoverlapping subintervals,
each of length $n^{\lambda_1}$, arranged in order as
$\mathcal{J}_1, \ldots, \mathcal{J}_{n^{\lambda-\lambda_1}}$ from left to right;
and that, with $i_j$ denoting the integer furthest to the left in
$\mathcal{J}_j$,

$$P\{I(X_{ni} > t) = I(X_{n i_j} > t) \text{ for } i \in \mathcal{J}_j \text{ and } 1 \le j \le n^{\lambda-\lambda_1}\} = 1 - O(n^{\gamma} e^{-t^2/2}), \tag{5.4}$$

uniformly in $t \ge 1$, for a constant $\gamma > 0$. Result (5.3) is an instance
of (5.4), with $\lambda_1 = \varepsilon$ and
$\gamma = \lambda + (\varepsilon\alpha/2) - (\alpha_0/2)$. We shall argue by
induction, from (5.3) via (5.4).

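Consider the sparsely sampled time-series
$X_{n i_1}, \ldots, X_{n i_{n^{\lambda-\lambda_1}}}$. Partition the equally
spaced integers $i_1, \ldots, i_{n^{\lambda-\lambda_1}}$, defined immediately
above (5.4), into consecutive blocks
$\mathcal{K}_1, \ldots, \mathcal{K}_{n^{\lambda-\lambda_1-\lambda_2}}$, each
containing $n^{\lambda_2}$ integers, where $0 < \lambda_2 < \lambda - \lambda_1$.
Let $k_\ell$ denote the integer furthest to the left in $\mathcal{K}_\ell$.
Then, by Lemma 5.2, we have uniformly in $t \ge 1$,

$$P\{I(X_{ni} > t) = I(X_{n k_\ell} > t) \text{ for } i \in \mathcal{K}_\ell \text{ and } 1 \le \ell \le n^{\lambda-\lambda_1-\lambda_2}\} = 1 - O\{n^{\lambda-\lambda_1}(n^{\lambda_2\alpha}/n^{\alpha_0})^{1/2} e^{-t^2/2}\} = 1 - O(n^{\lambda-\lambda_1+(\lambda_2\alpha/2)-(\alpha_0/2)} e^{-t^2/2}). \tag{5.5}$$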

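For each $\ell$, put
$\mathcal{L}_\ell = \bigcup_{j \in \mathcal{K}_\ell} \mathcal{J}_j$. This gives
a partition of $[0, n^{\lambda}]$ into $n^{\lambda-\lambda_1-\lambda_2}$
disjoint subintervals $\mathcal{L}_\ell$ with
$1 \le \ell \le n^{\lambda-\lambda_1-\lambda_2}$, each $\mathcal{L}_\ell$
containing $n^{\lambda_1+\lambda_2}$ consecutive integers, and for which, in
view of (5.4) and (5.5), the version of (5.4) holds with $i_j$ replaced by
$m_j$ (here denoting the least integer in $\mathcal{L}_j$), $\mathcal{J}_j$
replaced by $\mathcal{L}_j$, $n^{\lambda-\lambda_1}$ replaced by
$n^{\lambda-\lambda_1-\lambda_2}$ and $\gamma$ replaced by

$$\gamma' = \max\{\gamma,\ \lambda - \lambda_1 + \tfrac{1}{2}(\lambda_2\alpha - \alpha_0)\}. \tag{5.6}$$

That is, uniformly in $t \ge 1$,

$$P\{I(X_{ni} > t) = I(X_{n m_j} > t) \text{ for } i \in \mathcal{L}_j \text{ and } 1 \le j \le n^{\lambda-\lambda_1-\lambda_2}\} = 1 - O(n^{\gamma'} e^{-t^2/2}). \tag{5.7}$$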

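Having achieved, in (5.3), result (5.4) for $\lambda_1 = \varepsilon$ and
$\gamma = \lambda + \frac{1}{2}(\varepsilon\alpha - \alpha_0)$, we may, noting
the definition of $\gamma'$ at (5.6), ensure that (5.4) holds with
$\gamma' = \gamma$, by choosing $\lambda_2$ such that

$$\lambda - \lambda_1 + \tfrac{1}{2}(\lambda_2\alpha - \alpha_0) = \lambda + \tfrac{1}{2}(\varepsilon\alpha - \alpha_0);$$

that is, $\lambda_2 = \varepsilon + (2\lambda_1/\alpha)$. Selecting this
$\lambda_2$ implies that, in passing from (5.3) to (5.7), the number of
subintervals [$\mathcal{I}_j$ in (5.3) and $\mathcal{L}_j$ in (5.7)] has
decreased from $n^{\lambda-\varepsilon}$ to $n^{\xi_1}$, where

$$\xi_1 = \lambda - \lambda_1 - \lambda_2 = \lambda - \varepsilon - \lambda_1\Big(1 + \frac{2}{\alpha}\Big) = \lambda - \lambda_1',$$

with $\lambda_1' = \varepsilon + \lambda_1(1 + 2\alpha^{-1}) = \varepsilon(1 + \tau)$
and $\tau = 1 + 2\alpha^{-1}$. That is, (5.4) holds with $\lambda_1$ replaced by
$\lambda_1'$ but still with
$\gamma = \lambda + \frac{1}{2}(\varepsilon\alpha - \alpha_0)$. A further
iteration of this argument decreases $\xi_1$ to

$$\xi_2 = \lambda - \varepsilon(1 + \tau) - \Big[\varepsilon + \frac{2}{\alpha}\Big\{\varepsilon + \varepsilon\Big(1 + \frac{2}{\alpha}\Big)\Big\}\Big] = \lambda - \varepsilon(1 + \tau + \tau^2),$$

achieving $\xi_k = \lambda - \varepsilon(1 + \tau + \cdots + \tau^k)$ after a
further $k - 2$ iterations. Stopping when $\xi_k \le 0$, we conclude that,
uniformly in $t \ge 1$,

$$P\{I(X_{ni} > t) = I(X_{n1} > t) \text{ for } 1 \le i \le n^{\lambda}\} = 1 - O(n^{\lambda+(\varepsilon\alpha/2)-(\alpha_0/2)} e^{-t^2/2}). \tag{5.8}$$

Since $\varepsilon > 0$ is arbitrary, then (5.8) implies (3.3), establishing
Theorem 3.2. □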
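The induction terminates after finitely many steps because the block-length exponent grows geometrically. The sketch below iterates the recursion $\lambda_1 \mapsto \varepsilon + \tau\lambda_1$ with $\tau = 1 + 2/\alpha$ until the exponent $\xi_k = \lambda - \varepsilon(1 + \tau + \cdots + \tau^k)$ is no longer positive. The function name and the numerical values of $\lambda$, $\varepsilon$ and $\alpha$ are hypothetical, chosen only to illustrate the arithmetic.

```python
def iterations_until_single_block(lam, eps, alpha):
    """Iterate lambda_1 -> eps + tau * lambda_1 until xi = lam - lambda_1 <= 0.

    Returns (k, xi_k), where k counts the iterations applied after (5.3)
    and xi_k = lam - eps * (1 + tau + ... + tau**k).
    """
    tau = 1.0 + 2.0 / alpha
    lam1 = eps            # block-length exponent established by (5.3)
    k = 0
    xi = lam - lam1       # exponent of the number of blocks
    while xi > 0:
        lam1 = eps + tau * lam1
        k += 1
        xi = lam - lam1
    return k, xi


# Hypothetical values: lambda = 0.5, epsilon = 0.01, alpha = 0.5 (so tau = 5).
k, xi = iterations_until_single_block(0.5, 0.01, 0.5)
print(k, xi)
```

Since $\tau > 1$, the number of iterations needed grows only logarithmically in $\lambda/\varepsilon$, so a fixed number of steps suffices for each fixed $\varepsilon$.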

5.3. Proof of Theorem 3.3. It suffices to treat the case $t \ge 1$.
Theorem 3.2 implies that, uniformly in $t \ge 1$,

$$\max_{j} P\{I(X_{ni} > t) = I(X_{n s_j} > t) \text{ for } i \in \mathcal{B}_j\} = 1 - O(n^{\lambda-(\alpha_0/2)+\eta} e^{-t^2/2}).$$

Therefore, the probability on the left-hand side of (3.7) equals

$$1 - O(b\, n^{\lambda-(\alpha_0/2)+\eta} e^{-t^2/2}),$$

uniformly in $t$. Theorem 3.3 follows from this property.

5.4. Proof of Theorem 3.4. Let $Z$ have the N(0, 1) distribution, and
choose $\lambda < \kappa$ so close to $\kappa$ that $1 - \lambda - q < 0$.
Write LHS for the left-hand side of (3.8). Then, using Theorem 3.3 to derive
the inequality below, we have

$$\mathrm{LHS} \le \sum_{i=1}^{b} P(X_{n s_i} > t) + o(1) = b\,P(Z > t) + o(1) = O(n^{1-\lambda} e^{-t^2/2}) + o(1) = O(n^{1-\lambda-q}) + o(1) \to 0,$$

which implies (3.8).

5.5. Proof of Theorem 3.5. Theorem 3.4 implies that, unless $q < 1 - \kappa$,
the level $t = \sqrt{2q \log n}$ is too high if we are conducting inference for
the process $X_{n1}, \ldots, X_{nn}$. Therefore we take $q < 1 - \kappa$ in
Lemma 5.3, below.

Lemma 5.3. Assume that the stationary, zero-mean Gaussian process
$X_{ni}$ has autocovariance given by (2.1), and that $\kappa < 1$. Let $t_n$
denote a sequence of positive constants diverging to infinity in such a manner
that $n^{1-\kappa}\{1 - \Phi(t_n)\}$ is bounded away from zero. Then, for each
integer $\nu \ge 1$,

$$E\Big[\sum_{i=1}^{n} \{I(X_{ni} > t) - P(X_{ni} > t)\}\Big]^{2\nu} = O\big[\{n^{\kappa+1}\Phi(t)\{1 - \Phi(t)\}\}^{\nu}\big] \tag{5.9}$$

uniformly in $|t| \le t_n$.

To complete the proof of Theorem 3.5, define

$$\delta(t) = \{n^{\kappa+1}\Phi(t)\{1 - \Phi(t)\}\}^{1/2}, \qquad \Delta(t) = \sum_{i=1}^{n} \{I(X_{ni} > t) - P(X_{ni} > t)\},$$

and observe that, if $\mathcal{T}_n$ is any subset of the values of $t$ in the
interval $[-t_n, t_n]$, and if $\eta > 0$, then by Markov's inequality,

$$P\{|\Delta(t)| > n^{\eta}\delta(t) \text{ for some } t \in \mathcal{T}_n\} \le n^{-2\nu\eta} E\Big(\sup_{t \in \mathcal{T}_n} |\Delta(t)/\delta(t)|^{2\nu}\Big) \le n^{-2\nu\eta} \sum_{t \in \mathcal{T}_n} E\{|\Delta(t)/\delta(t)|^{2\nu}\} \le n^{-2\nu\eta} (\#\mathcal{T}_n) \sup_{t \in \mathcal{T}_n} E\{|\Delta(t)/\delta(t)|^{2\nu}\} = O\{n^{-2\nu\eta}(\#\mathcal{T}_n)\}, \tag{5.10}$$

where the last identity follows from Lemma 5.3. Therefore, as long as
$\mathcal{T}_n$ has no more than polynomially many elements, the following
result holds: For all constants $C_1, \eta > 0$,

$$P\{|\Delta(t)| > n^{\eta}\delta(t) \text{ for some } t \in \mathcal{T}_n\} = O(n^{-C_1}). \tag{5.11}$$

By choosing the elements of $\mathcal{T}_n$ to be equally spaced on
$[-t_n, t_n]$, and selecting the spacing to equal $n^{-C_2}$, for $C_2$
sufficiently large but fixed; and noting that $\kappa + 1 - q > 0$ and,
uniformly in $t \in [-t_n, t_n]$,

$$\delta(t) \ge \{n^{\kappa+1-q}(\log n)^{-1}\}^{1/2},$$

we deduce from (5.11) that

$$P\{|\Delta(t)| > n^{\eta}\delta(t) \text{ for some } t \in [-t_n, t_n]\} \to 0. \tag{5.12}$$

This implies (3.9).

Proof of Theorem 3.7. Recall that we distribute = N^~^ signals 
among the n time-series points, where ^ = (1 — /9)(1 — k). Let 

$$U_n(t) = \sum_{i=1}^{n} \{I(X_{ni} > t) - P(X_{ni} > t)\},$$

$$V_n(t) = \sum_{i=1}^{n} \{I(Y_{ni} > t) - P(Y_{ni} > t)\},$$

$$W_n(t) = \sum_{i=1}^{n} I_{ni}\big[I(Y_{ni} > t) - I(X_{ni} > t) - \{P(Y_{ni} > t) - P(X_{ni} > t)\}\big],$$

where $I_{ni} = 1$ if a signal is added at "time" point $i$, equaling zero if no
signal is added there. Define $\Psi_1(t) = \Phi(t) - \Phi(t - \nu)$ and
$\Psi_2 = \Psi_1(1 - \Psi_1)$. An argument similar to that used to derive
Lemma 5.3 may be employed to show that, uniformly in $|t| \le t_n$,



(5.13) 



The right-hand side of (5.13) is an upper bound, over all choices of the
distribution of signal among the time-series data $X_{ni}$ as well as over all
values of $|t| \le t_n$. The order of magnitude of the left-hand side is
maximized when just N^~^ of the consecutive blocks of indices
$\{1, \ldots, n^{\kappa}\}, \{n^{\kappa} + 1, \ldots, 2n^{\kappa}\}, \ldots$
are chosen to receive signals, and, for each of the chosen blocks, a signal is
applied to each value of $X_{ni}$ the index of which is in that block.

Result (5.13), Markov's inequality and the argument leading to (5.10)
imply that if $\mathcal{T}_n$ is any finite subset of values in the interval
$[-t_n, t_n]$, then for each $\eta > 0$,

P[\Wn{t)\ > n^+''{n«^2(t)}^/^ for some t £ %] 

Wn{t) 



< n 



sup 



o{n-2-''(#r„)}. 



{n€^2(t)}^/2 

This leads to the following analogue of (5.12): for each r] > 0, 
(5.14) P[\Wnit)\ > n^+''^{n^^2it)Y^^ for some t £ [-t„,t„]] ^0. 



Noting that Vn = Un + Wn, writing ^'3 
(5.14), we deduce that for each t] > 0, 



and combining (5.11) and 



P 



(5.15) 



\Vn{t)\ 



3.-1 *2(t) 



^3(t) 



1/2 



for some t G [—tn,tn] 



0. 



Note too that 



Unit) 



Y.JmE{I{Yni>t)-I{Xni>t)] 



{iVcI>3(t)}l/2 

{N^:,{t)Y/^ 



n 



2^+3k- 



-iMi: 

^3(t) 



2 ^ 1/2 



Result (3.14) will follow from this property and (5.15), provided we show 
that, for some > and uniformly in t G [— ini^n]) 



^3(t) 



^3(0 
or equivalently, for some $\eta > 0$ and uniformly in $t \in [-t_n, t_n]$,

(5.16) ni/^ = 0{n^^i{t)}, = 0{n^^+^-^^i{tf^3{ty^}. 

Observe that n^~'^^'3(t) is bounded away from zero uniformly in |t| < t„, 
and so the first part of (5.16) holds provided the second part does. Now, 
the second part is equivalent to = 0[{n~'^Un{t)}'^] uniformly in t. If the 
point (/3, r) lies an amount e > above the standard detection boundary, at 
(3.13), then there exists 6 = S{f3,e) > 0, and t = t{n) £ [— such that 
n~'^Un{t) > const, for all n. Therefore, the second part of (5.16) holds if 
0<rj<6{(3,e). □ 

Proof of Theorem 3.8. Note that, for each $i$, $X_{n,i+1} - X_{ni}$ is
normally distributed with zero mean and variance $2n^{-\alpha_0}$. Therefore,
the probability that the signal detector determines that the signal is present,
given that it is not, is bounded above by

$$2n[1 - \Phi\{(2C \log n)^{1/2}\}] = O\{n^{1-C}(\log n)^{-1/2}\} \to 0;$$

and the probability that the signal is found by the detector to be present,
given that it is, is bounded below by

$$1 - \Phi\{(2C \log n)^{1/2} - (2r \log n)^{1/2}\} \to 1. \qquad \Box$$
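The two bounds in the proof of Theorem 3.8 are easy to evaluate numerically, which gives a feel for how quickly they approach their limits. The sketch below is illustrative only: the function names and the values $C = 1.5$, $r = 3$ are hypothetical, and the normal tail is computed through the complementary error function.

```python
import math


def normal_tail(x):
    """1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def false_alarm_bound(n, C):
    """Upper bound 2n[1 - Phi{(2C log n)^(1/2)}] on the false-alarm probability."""
    return 2.0 * n * normal_tail(math.sqrt(2.0 * C * math.log(n)))


def detection_bound(n, C, r):
    """Lower bound 1 - Phi{(2C log n)^(1/2) - (2r log n)^(1/2)} on detection."""
    shift = math.sqrt(2.0 * C * math.log(n)) - math.sqrt(2.0 * r * math.log(n))
    return normal_tail(shift)


# Hypothetical constants with 1 < C < r.
for n in (10 ** 3, 10 ** 6):
    print(n, false_alarm_bound(n, 1.5), detection_bound(n, 1.5, 3.0))
```

With $1 < C < r$ the first quantity decreases toward 0 and the second increases toward 1 as $n$ grows, in line with the two limits in the proof.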